<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<link rel="STYLESHEET" href="lib.css" type='text/css' />
<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
<link rel='start' href='../index.html' title='Python documentation Index' />
<link rel="first" href="lib.html" title='Python library Reference' />
<link rel='contents' href='contents.html' title="Contents" />
<link rel='index' href='genindex.html' title='Index' />
<link rel='last' href='about.html' title='About this document...' />
<link rel='help' href='about.html' title='About this document...' />
<link rel="next" href="module-SocketServer.html" />
<link rel="prev" href="module-uuid.html" />
<link rel="parent" href="internet.html" />
<link rel="next" href="urlparse-result-object.html" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name='aesop' content='information' />
<title>18.17 urlparse -- Parse URLs into components</title>
</head>
<body>
<div class="navigation">
<div id='top-navigation-panel' xml:id='top-navigation-panel'>
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="18.16.1 Example"
  href="uuid-example.html"><img src='../icons/previous.png'
  border='0' height='32'  alt='Previous Page' width='32' /></a></td>
<td class='online-navigation'><a rel="parent" title="18. internet Protocols and"
  href="internet.html"><img src='../icons/up.png'
  border='0' height='32'  alt='Up one Level' width='32' /></a></td>
<td class='online-navigation'><a rel="next" title="18.17.1 results of urlparse()"
  href="urlparse-result-object.html"><img src='../icons/next.png'
  border='0' height='32'  alt='Next Page' width='32' /></a></td>
<td align="center" width="100%">Python Library Reference</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
  href="contents.html"><img src='../icons/contents.png'
  border='0' height='32'  alt='Contents' width='32' /></a></td>
<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
  border='0' height='32'  alt='Module Index' width='32' /></a></td>
<td class='online-navigation'><a rel="index" title="Index"
  href="genindex.html"><img src='../icons/index.png'
  border='0' height='32'  alt='Index' width='32' /></a></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="uuid-example.html">18.16.1 Example</a>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="internet.html">18. Internet Protocols and</a>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="urlparse-result-object.html">18.17.1 Results of urlparse()</a>
</div>
<hr /></div>
</div>
<!--End of Navigation Panel-->

<h1><a name="SECTION00201700000000000000000">
18.17 <tt class="module">urlparse</tt> --
         Parse URLs into components</a>
</h1>
<a name="module-urlparse"></a>
<p>

<p>
<a id='l2h-4286' xml:id='l2h-4286'></a>
<a id='l2h-4278' xml:id='l2h-4278'></a><a id='l2h-4279' xml:id='l2h-4279'></a>
<p>
This module defines a standard interface to break Uniform Resource
Locator (URL) strings up in components (addressing scheme, network
location, path etc.), to combine the components back into a URL
string, and to convert a ``relative URL'' to an absolute URL given a
``base URL.''

<p>
The module has been designed to match the Internet RFC on Relative
Uniform Resource Locators (and discovered a bug in an earlier
draft!). It supports the following URL schemes:
<code>file</code>, <code>ftp</code>, <code>gopher</code>, <code>hdl</code>, <code>http</code>, 
<code>https</code>, <code>imap</code>, <code>mailto</code>, <code>mms</code>, <code>news</code>, 
<code>nntp</code>, <code>prospero</code>, <code>rsync</code>, <code>rtsp</code>, <code>rtspu</code>, 
<code>sftp</code>, <code>shttp</code>, <code>sip</code>, <code>sips</code>, <code>snews</code>, <code>svn</code>, 
<code>svn+ssh</code>, <code>telnet</code>, <code>wais</code>.

<p>

<span class="versionnote">New in version 2.5:
Support for the <code>sftp</code> and <code>sips</code> schemes.</span>

<p>
The <tt class="module">urlparse</tt> module defines the following functions:

<p>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
  <td><nobr><b><tt id='l2h-4280' xml:id='l2h-4280' class="function">urlparse</tt></b>(</nobr></td>
  <td><var>urlstring</var><big>[</big><var>,
                           default_scheme</var><big>[</big><var>, allow_fragments</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Parse a URL into six components, returning a 6-tuple.  This
corresponds to the general structure of a URL:
<code><var>scheme</var>://<var>netloc</var>/<var>path</var>;<var>parameters</var>?<var>query</var>#<var>fragment</var></code>.
Each tuple item is a string, possibly empty.
The components are not broken up in smaller parts (for example, the network
location is a single string), and % escapes are not expanded.
The delimiters as shown above are not part of the result,
except for a leading slash in the <var>path</var> component, which is
retained if present.  For example:

<p>
<div class="verbatim"><pre>
&gt;&gt;&gt; from urlparse import urlparse
&gt;&gt;&gt; o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
&gt;&gt;&gt; o
('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')
&gt;&gt;&gt; o.scheme
'http'
&gt;&gt;&gt; o.port
80
&gt;&gt;&gt; o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'
</pre></div>

<p>
If the <var>default_scheme</var> argument is specified, it gives the
default addressing scheme, to be used only if the URL does not
specify one.  The default value for this argument is the empty string.

<p>
If the <var>allow_fragments</var> argument is false, fragment identifiers
are not allowed, even if the URL's addressing scheme normally does
support them.  The default value for this argument is <tt class="constant">True</tt>.

<p>
The return value is actually an instance of a subclass of
tuple.  This class has the following additional read-only
convenience attributes:

<p>
<div class="center"><table class="realtable">
  <thead>
    <tr>
      <th class="left"  >Attribute</th>
      <th class="center">Index</th>
      <th class="left"  >Value</th>
      <th class="center">Value if not present</th>
      </tr>
    </thead>
  <tbody>
    <tr><td class="left"   valign="baseline"><tt class="member">scheme</tt></td>
        <td class="center">0</td>
        <td class="left"  >URL scheme specifier</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">netloc</tt></td>
        <td class="center">1</td>
        <td class="left"  >Network location part</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">path</tt></td>
        <td class="center">2</td>
        <td class="left"  >Hierarchical path</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">params</tt></td>
        <td class="center">3</td>
        <td class="left"  >Parameters for last path element</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">query</tt></td>
        <td class="center">4</td>
        <td class="left"  >Query component</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">fragment</tt></td>
        <td class="center">5</td>
        <td class="left"  >Fragment identifier</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">username</tt></td>
        <td class="center"> </td>
        <td class="left"  >User name</td>
        <td class="center"><tt class="constant">None</tt></td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">password</tt></td>
        <td class="center"> </td>
        <td class="left"  >Password</td>
        <td class="center"><tt class="constant">None</tt></td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">hostname</tt></td>
        <td class="center"> </td>
        <td class="left"  >Host name (lower case)</td>
        <td class="center"><tt class="constant">None</tt></td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">port</tt></td>
        <td class="center"> </td>
        <td class="left"  >Port number as integer, if present</td>
        <td class="center"><tt class="constant">None</tt></td></tr></tbody>
</table></div>

<p>
See section&nbsp;<a href="urlparse-result-object.html#urlparse-result-object">18.17.1</a>, ``Results of
<tt class="function">urlparse()</tt> and <tt class="function">urlsplit()</tt>,'' for more
information on the result object.

<p>

<span class="versionnote">Changed in version 2.5:
Added attributes to return value.</span>

</dl>

<p>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
  <td><nobr><b><tt id='l2h-4281' xml:id='l2h-4281' class="function">urlunparse</tt></b>(</nobr></td>
  <td><var>parts</var>)</td></tr></table></dt>
<dd>
Construct a URL from a tuple as returned by <code>urlparse()</code>.
The <var>parts</var> argument can be any six-item iterable.
This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example,
a ? with an empty query; the RFC states that these are equivalent).
</dl>

<p>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
  <td><nobr><b><tt id='l2h-4282' xml:id='l2h-4282' class="function">urlsplit</tt></b>(</nobr></td>
  <td><var>urlstring</var><big>[</big><var>,
                           default_scheme</var><big>[</big><var>, allow_fragments</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
This is similar to <tt class="function">urlparse()</tt>, but does not split the
params from the URL.  This should generally be used instead of
<tt class="function">urlparse()</tt> if the more recent URL syntax allowing
parameters to be applied to each segment of the <var>path</var> portion of
the URL (see <a class="rfc" id='rfcref-103276' xml:id='rfcref-103276'
href="http://www.faqs.org/rfcs/rfc2396.html">RFC 2396</a>) is wanted.  A separate function is needed to
separate the path segments and parameters.  This function returns a
5-tuple: (addressing scheme, network location, path, query, fragment
identifier).

<p>
The return value is actually an instance of a subclass of
tuple.  This class has the following additional read-only
convenience attributes:

<p>
<div class="center"><table class="realtable">
  <thead>
    <tr>
      <th class="left"  >Attribute</th>
      <th class="center">Index</th>
      <th class="left"  >Value</th>
      <th class="center">Value if not present</th>
      </tr>
    </thead>
  <tbody>
    <tr><td class="left"   valign="baseline"><tt class="member">scheme</tt></td>
        <td class="center">0</td>
        <td class="left"  >URL scheme specifier</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">netloc</tt></td>
        <td class="center">1</td>
        <td class="left"  >Network location part</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">path</tt></td>
        <td class="center">2</td>
        <td class="left"  >Hierarchical path</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">query</tt></td>
        <td class="center">3</td>
        <td class="left"  >Query component</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">fragment</tt></td>
        <td class="center">4</td>
        <td class="left"  >Fragment identifier</td>
        <td class="center">empty string</td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">username</tt></td>
        <td class="center"> </td>
        <td class="left"  >User name</td>
        <td class="center"><tt class="constant">None</tt></td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">password</tt></td>
        <td class="center"> </td>
        <td class="left"  >Password</td>
        <td class="center"><tt class="constant">None</tt></td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">hostname</tt></td>
        <td class="center"> </td>
        <td class="left"  >Host name (lower case)</td>
        <td class="center"><tt class="constant">None</tt></td></tr>
    <tr><td class="left"   valign="baseline"><tt class="member">port</tt></td>
        <td class="center"> </td>
        <td class="left"  >Port number as integer, if present</td>
        <td class="center"><tt class="constant">None</tt></td></tr></tbody>
</table></div>

<p>
See section&nbsp;<a href="urlparse-result-object.html#urlparse-result-object">18.17.1</a>, ``Results of
<tt class="function">urlparse()</tt> and <tt class="function">urlsplit()</tt>,'' for more
information on the result object.

<p>

<span class="versionnote">New in version 2.2.</span>

<span class="versionnote">Changed in version 2.5:
Added attributes to return value.</span>

</dl>

<p>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
  <td><nobr><b><tt id='l2h-4283' xml:id='l2h-4283' class="function">urlunsplit</tt></b>(</nobr></td>
  <td><var>parts</var>)</td></tr></table></dt>
<dd>
Combine the elements of a tuple as returned by <tt class="function">urlsplit()</tt>
into a complete URL as a string.
The <var>parts</var> argument can be any five-item iterable.
This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example,
a ? with an empty query; the RFC states that these are equivalent).

<span class="versionnote">New in version 2.2.</span>

</dl>

<p>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
  <td><nobr><b><tt id='l2h-4284' xml:id='l2h-4284' class="function">urljoin</tt></b>(</nobr></td>
  <td><var>base, url</var><big>[</big><var>, allow_fragments</var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Construct a full (``absolute'') URL by combining a ``base URL''
(<var>base</var>) with another URL (<var>url</var>).  Informally, this
uses components of the base URL, in particular the addressing scheme,
the network location and (part of) the path, to provide missing
components in the relative URL.  For example:

<p>
<div class="verbatim"><pre>
&gt;&gt;&gt; from urlparse import urljoin
&gt;&gt;&gt; urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'
</pre></div>

<p>
The <var>allow_fragments</var> argument has the same meaning and default as
for <tt class="function">urlparse()</tt>.

<p>
<span class="note"><b class="label">Note:</b>
If <var>url</var> is an absolute URL (that is, starting with <code>//</code>
      or <code>scheme://</code>, the <var>url</var>'s host name and/or scheme
      will be present in the result.  For example:</span>

<p>
<div class="verbatim"><pre>
&gt;&gt;&gt; urljoin('http://www.cwi.nl/%7Eguido/Python.html',
...         '//www.python.org/%7Eguido')
'http://www.python.org/%7Eguido'
</pre></div>

<p>
If you do not want that behavior, preprocess
the <var>url</var> with <tt class="function">urlsplit()</tt> and <tt class="function">urlunsplit()</tt>,
removing possible <em>scheme</em> and <em>netloc</em> parts.
</dl>

<p>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
  <td><nobr><b><tt id='l2h-4285' xml:id='l2h-4285' class="function">urldefrag</tt></b>(</nobr></td>
  <td><var>url</var>)</td></tr></table></dt>
<dd>
If <var>url</var> contains a fragment identifier, returns a modified
version of <var>url</var> with no fragment identifier, and the fragment
identifier as a separate string.  If there is no fragment identifier
in <var>url</var>, returns <var>url</var> unmodified and an empty string.
</dl>

<p>
<div class="seealso">
  <p class="heading">See Also:</p>

  <dl compact="compact" class="seerfc">
    <dt><a href="http://www.faqs.org/rfcs/rfc1738.html"
        title="Uniform resource Locators (URL)"
        >RFC 1738, <em>Uniform Resource Locators (URL)</em></a>
    <dd>
        This specifies the formal syntax and semantics of absolute
        URLs.
  </dl>
  <dl compact="compact" class="seerfc">
    <dt><a href="http://www.faqs.org/rfcs/rfc1808.html"
        title="Relative uniform Resource Locators"
        >RFC 1808, <em>Relative Uniform Resource Locators</em></a>
    <dd>
        This Request For Comments includes the rules for joining an
        absolute and a relative URL, including a fair number of
        ``Abnormal Examples'' which govern the treatment of border
        cases.
  </dl>
  <dl compact="compact" class="seerfc">
    <dt><a href="http://www.faqs.org/rfcs/rfc2396.html"
        title="Uniform resource Identifiers (URI): generic Syntax"
        >RFC 2396, <em>Uniform Resource Identifiers (URI): Generic Syntax</em></a>
    <dd>
        Document describing the generic syntactic requirements for
        both Uniform Resource Names (URNs) and Uniform Resource
        Locators (URLs).
  </dl>
</div>

<p>

<p><br /></p><hr class='online-navigation' />
<div class='online-navigation'>
<!--Table of Child-Links-->
<a name="CHILD_LINKS"><strong>Subsections</strong></a>

<ul class="ChildLinks">
<li><a href="urlparse-result-object.html">18.17.1 Results of <tt class="function">urlparse()</tt> and <tt class="function">urlsplit()</tt></a>
</ul>
<!--End of Table of Child-Links-->
</div>

<div class="navigation">
<div class='online-navigation'>
<p></p><hr />
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="18.16.1 Example"
  href="uuid-example.html"><img src='../icons/previous.png'
  border='0' height='32'  alt='Previous Page' width='32' /></a></td>
<td class='online-navigation'><a rel="parent" title="18. internet Protocols and"
  href="internet.html"><img src='../icons/up.png'
  border='0' height='32'  alt='Up one Level' width='32' /></a></td>
<td class='online-navigation'><a rel="next" title="18.17.1 results of urlparse()"
  href="urlparse-result-object.html"><img src='../icons/next.png'
  border='0' height='32'  alt='Next Page' width='32' /></a></td>
<td align="center" width="100%">Python Library Reference</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
  href="contents.html"><img src='../icons/contents.png'
  border='0' height='32'  alt='Contents' width='32' /></a></td>
<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
  border='0' height='32'  alt='Module Index' width='32' /></a></td>
<td class='online-navigation'><a rel="index" title="Index"
  href="genindex.html"><img src='../icons/index.png'
  border='0' height='32'  alt='Index' width='32' /></a></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="uuid-example.html">18.16.1 Example</a>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="internet.html">18. Internet Protocols and</a>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="urlparse-result-object.html">18.17.1 Results of urlparse()</a>
</div>
</div>
<hr />
<span class="release-info">Release 2.5.1, documentation updated on 18th April, 2007.</span>
</div>
<!--End of Navigation Panel-->
<address>
See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
</address>
</body>
</html>
